New globally distributed bacterial phyla within the FCB superphylum

Microbes in marine sediments play crucial roles in global carbon and nutrient cycling. However, our understanding of microbial diversity and physiology on the ocean floor is limited. Here, we use phylogenomic analyses of thousands of metagenome-assembled genomes (MAGs) from coastal and deep-sea sediments to identify 55 MAGs that are phylogenetically distinct from previously described bacterial phyla. We propose that these MAGs belong to 4 novel bacterial phyla (Blakebacterota, Orphanbacterota, Arandabacterota, and Joyebacterota) and a previously proposed phylum (AABM5-125-24), all of them within the FCB superphylum. Comparison of their rRNA genes with public databases reveals that these phyla are globally distributed in different habitats, including marine, freshwater, and terrestrial environments. Genomic analyses suggest these organisms are capable of mediating key steps in sedimentary biogeochemistry, including anaerobic degradation of polysaccharides and proteins, and respiration of sulfur and nitrogen. Interestingly, these genomes code for an unusually high proportion (~9% on average, up to 20% per genome) of protein families lacking representatives in public databases. Genes encoding hundreds of these protein families colocalize with genes predicted to be involved in sulfur reduction, nitrogen cycling, energy conservation, and degradation of organic compounds. Our findings advance our understanding of bacterial diversity, the ecological roles of these bacteria, and potential links between novel gene families and metabolic processes in the oceans.

and pepP (K01262) upstream. Grey denotes proteins without homologues in current databases.
Thus, the two novel protein families are widely present in more than 70% of the AABM5 genomes (31 curated AABM5 genomes) but rarely detected in the collection of 169,642 prokaryotic genomes.

Carbohydrate-active enzymes (CAZymes)
All genomes within these five phyla have genes encoding a potentially diverse set of CAZymes.

… M38 (isoaspartyl dipeptidase); M41 (FtsH peptidase) and S16 (Lon-A peptidase), both ATP-dependent endopeptidases. I87, which inhibits FtsH and modulates the degradation of membrane-disrupting mistranslation products 1, is encoded in more than four copies per MAG in all five phyla (Supplementary Fig. 7). Orphanbacterota is the most diverse phylum, with 87 peptidase families, whereas Joyebacterota is the least diverse, with 71.
Of the 122 classified families/subfamilies, 24 were found exclusively in a single phylum. These unique families/subfamilies are mostly present in fewer than 50% of the MAGs in that phylum and have lower gene copy numbers than the other peptidase types. In addition, none of the genes in these unique families/subfamilies carries a secretion signal.
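The tallies above (families per phylum, families restricted to one phylum, and the share of MAGs carrying a family) reduce to simple set operations. The sketch below assumes a hypothetical input layout, phylum mapped to MAG mapped to a set of MEROPS family labels; the phylum names and values are placeholders, not data from this study, and this is not presented as the authors' pipeline.

```python
from collections import Counter

# Hypothetical input layout: phylum -> {MAG id -> set of MEROPS peptidase families}.
PeptidaseTable = dict[str, dict[str, set[str]]]


def families_per_phylum(table: PeptidaseTable) -> dict[str, set[str]]:
    """Union of peptidase families over all MAGs in each phylum."""
    return {p: set().union(*mags.values()) for p, mags in table.items()}


def exclusive_families(table: PeptidaseTable) -> dict[str, set[str]]:
    """Families detected in exactly one phylum, keyed by that phylum."""
    per_phylum = families_per_phylum(table)
    # Count in how many phyla each family occurs.
    n_phyla = Counter(f for fams in per_phylum.values() for f in fams)
    return {p: {f for f in fams if n_phyla[f] == 1}
            for p, fams in per_phylum.items()}


def prevalence(table: PeptidaseTable, phylum: str, family: str) -> float:
    """Fraction of MAGs in a phylum that encode a given family."""
    mags = table[phylum]
    return sum(family in fams for fams in mags.values()) / len(mags)


# Toy example with placeholder phyla; values are illustrative only.
toy = {
    "PhylumA": {"mag1": {"S08A", "M28C", "C01A"}, "mag2": {"S08A", "M28C"}},
    "PhylumB": {"mag3": {"S08A", "M38"}, "mag4": {"S08A"}},
}
print(exclusive_families(toy))
print(prevalence(toy, "PhylumA", "C01A"))  # 0.5
```

Phylum-level diversity, as reported above for Orphanbacterota and Joyebacterota, would then be `len(families_per_phylum(table)[phylum])`.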
A total of 185 genes, distributed in 14 families/subfamilies, are predicted to be extracellular. The only extracellular family/subfamily consistently found in all five phyla is the serine endopeptidase subtilisin (S08A), which is widely distributed in eukaryotes and prokaryotes 2,3 .
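Identifying an extracellular family shared by every phylum amounts to keeping only genes flagged as secreted and intersecting the resulting family sets across phyla. In this minimal sketch the per-gene record and its boolean secretion flag are assumptions standing in for the output of whatever signal-peptide predictor was used; the records are toy placeholders.

```python
# Hypothetical per-gene record: (phylum, MAG id, peptidase family, predicted secreted?)
# The secretion flag stands in for whatever signal-peptide predictor was used.
Gene = tuple[str, str, str, bool]


def extracellular_families(genes: list[Gene]) -> dict[str, set[str]]:
    """Per phylum, the families with at least one predicted-secreted gene."""
    out: dict[str, set[str]] = {}
    for phylum, _mag, family, secreted in genes:
        out.setdefault(phylum, set())  # keep phyla with no secreted genes
        if secreted:
            out[phylum].add(family)
    return out


def shared_by_all(per_phylum: dict[str, set[str]]) -> set[str]:
    """Families present in every phylum (set intersection)."""
    sets = list(per_phylum.values())
    return set.intersection(*sets) if sets else set()


toy_genes = [
    ("PhylumA", "m1", "S08A", True),
    ("PhylumA", "m1", "M28C", True),
    ("PhylumB", "m2", "S08A", True),
    ("PhylumB", "m2", "M28C", False),
]
per_phylum = extracellular_families(toy_genes)
print(shared_by_all(per_phylum))       # {'S08A'}
print(sum(g[3] for g in toy_genes))    # 3 predicted-secreted genes
```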
Blakebacterota, Orphanbacterota, and Arandabacterota encode many putative extracellular peptidases belonging to family M28 (subfamilies M28C, M28E, and M28F). These three subfamilies are mainly Streptomyces-type aminopeptidases and carboxypeptidases, which release basic amino acids and C-terminal glutamates, respectively. Joyebacterota encodes different types of putative extracellular peptidases from the other phyla: six of its sequences with potential secretion signals belong to family C01A (papain), a heat-resistant enzyme with an optimal temperature range of 60 to 70 °C 4 . This may point to a significant role of Joyebacterota in the extracellular degradation of proteins in hydrothermal vent areas.

Central metabolism
Glycolysis: MAGs from Orphanbacterota, Arandabacterota, and Joyebacterota encode most of the key genes for glycolysis (Supplementary Fig. 5). Interestingly, the gene encoding pyruvate kinase was not found in AABM5 or Blakebacterota, suggesting that these two phyla may lack a complete glycolytic pathway for energy production. In addition, less than half of … The reductive glycine pathway (rGlyP), a synthetic pathway for formate assimilation 12 , is found across MAGs from the five phyla. The glycine cleavage system (GCS), which catalyzes the reversible conversion of CO2, 5,10-methylene-THF, and ammonia to glycine and tetrahydrofolate (THF), is annotated in all five phyla. We also explored the potential for CO oxidation in these genomes; however, none of the MAGs encodes all three subunits of the coxLMS complex, and only one Blakebacterota MAG carries the large subunit of aerobic CO dehydrogenase (coxL).
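For reference, the net reaction of the glycine cleavage system can be written as below. This uses the standard textbook stoichiometry; the NAD+/NADH pair comes from the system's lipoamide dehydrogenase component and is an addition not stated in the text. In the rGlyP the reaction runs right to left, condensing CO2, ammonia, and the methylene group of 5,10-methylene-THF into glycine.

```latex
\text{glycine} + \mathrm{THF} + \mathrm{NAD^{+}}
  \;\rightleftharpoons\;
  5,10\text{-methylene-THF} + \mathrm{CO_2} + \mathrm{NH_3} + \mathrm{NADH} + \mathrm{H^{+}}
```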
TCA cycle: All five phyla have the complete set of genes for the TCA cycle (Supplementary Fig. 5). … ubiquinol oxidase, suggesting that they may use oxygen efficiently as an electron acceptor only when oxygen concentrations are low, scavenging oxygen for protection rather than for energy production. Additionally, we identified the acetogenic-type Rhodobacter nitrogen fixation (Rnf) electron transport complex in four of the five phyla (Blakebacterota, Orphanbacterota, Arandabacterota, and Joyebacterota) (Supplementary Fig. 5). This complex could serve as a respiratory enzyme that couples the oxidation of reduced ferredoxin to the reduction of NAD+. The free energy of this exergonic reaction could be used to pump sodium ions or protons out of the cell, generating an electrochemical gradient that can in turn drive ATP synthesis. MAGs with Rnf complex genes have only a partial WLP (see details in the WLP section), yet they all have the reductive glycine pathway (rGlyP) for formate assimilation.
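The Rnf coupling of ferredoxin oxidation to NAD+ reduction can be summarized as a net reaction. This is a simplified sketch: ferredoxin is treated as a one-electron carrier, protons may substitute for Na+, and the number of ions translocated per NADH, n, is left unspecified because the text does not give it.

```latex
2\,\mathrm{Fd_{red}^{-}} + \mathrm{NAD^{+}} + \mathrm{H^{+}} + n\,\mathrm{Na^{+}_{in}}
  \;\rightarrow\;
  2\,\mathrm{Fd_{ox}} + \mathrm{NADH} + n\,\mathrm{Na^{+}_{out}}
```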
Lipids: Genes encoding enzymes responsible for the transport, activation, and cleavage of fatty acids through beta-oxidation 15 are found in all phyla (e.g., genes encoding acyl-CoA dehydrogenase and enoyl-CoA hydratase are common in all five phyla).
However, only two Orphanbacterota MAGs appear to encode the complete beta-oxidation pathway (i.e., they also encode 3-hydroxyacyl-CoA dehydrogenase and acetyl-CoA acyltransferase). This suggests a limited capacity for fatty acid degradation in these five phyla.
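Calling a pathway "complete" in a MAG, as done above for beta-oxidation, boils down to checking that every required step has at least one annotated gene. The minimal sketch below lists the four core beta-oxidation enzymes by name; a real workflow would map these to annotation identifiers (e.g. KEGG orthologs), a mapping deliberately left out here. The MAG IDs and annotations are hypothetical.

```python
# Core steps of beta-oxidation named in the text, expressed as enzyme names.
# Mapping these to annotation IDs (e.g. KEGG orthologs) is left out on purpose.
BETA_OXIDATION: tuple[str, ...] = (
    "acyl-CoA dehydrogenase",
    "enoyl-CoA hydratase",
    "3-hydroxyacyl-CoA dehydrogenase",
    "acetyl-CoA acyltransferase",
)


def pathway_completeness(annotations: set[str],
                         steps: tuple[str, ...] = BETA_OXIDATION) -> float:
    """Fraction of pathway steps with at least one annotated gene in a MAG."""
    return sum(step in annotations for step in steps) / len(steps)


def complete_mags(mags: dict[str, set[str]],
                  steps: tuple[str, ...] = BETA_OXIDATION) -> list[str]:
    """IDs of MAGs that encode every step of the pathway, sorted."""
    return sorted(m for m, ann in mags.items()
                  if pathway_completeness(ann, steps) == 1.0)


# Hypothetical toy annotations (not data from this study).
toy_mags = {
    "mag_1": {"acyl-CoA dehydrogenase", "enoyl-CoA hydratase"},
    "mag_2": set(BETA_OXIDATION),
}
print(pathway_completeness(toy_mags["mag_1"]))  # 0.5
print(complete_mags(toy_mags))                  # ['mag_2']
```

A MAG like `mag_1`, carrying only the first two steps, mirrors the common pattern described above: partial beta-oxidation genes without the full pathway.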
Amino acids: Several amino acids, including alanine, serine, asparagine, histidine, and lysine, could be degraded via central metabolism (i.e., the TCA cycle and pyruvate metabolism), which is also the key route for energy production in most phyla (Supplementary Fig. 5). In addition, genes encoding glutamine synthetase, glutamate synthase, and glutamate dehydrogenase are commonly annotated in these five phyla, suggesting active anabolic and catabolic metabolism involving glutamate, glutamine, and ammonium. However, all phyla lack genes encoding key enzymes for the degradation of branched-chain and aromatic amino acids, e.g., valine, leucine, isoleucine, and tyrosine.

Sulfur metabolism
Marine sediments are active sites of sulfur and nitrogen cycling. Genome-specific metabolic reconstruction indicates that all five phyla could use different sulfur and nitrogen compounds as energy sources (Fig. 4). Most AABM5 MAGs encode sulfate permease, whereas only a few Orphanbacterota and Arandabacterota MAGs do. In addition, only one AABM5 MAG encodes most of the sulfate/thiosulfate transport system (CysAUWP); it lacks the CysP subunit, which is required for the sulfate/thiosulfate ABC transporter 28 .

Nitrogen metabolism
These five phyla encode both the oxidative and reductive pathways of the nitrogen cycle ( Supplementary Fig. 5). Most of the MAGs in these five phyla encode genes for glutamine synthetase and glutamate dehydrogenase for ammonia assimilation.

Hydrogen metabolism
Hydrogen production or consumption by hydrogenases is thought to be crucial for energy cycling in both coastal and hydrothermal environments 29,30 . All five phyla encode different types of hydrogenases, indicating diverse hydrogen metabolism that could supply intracellular reducing equivalents to drive other metabolic pathways in these bacteria 31,32 . Most hydrogenases were found in MAGs from hydrothermal and cold seep sediments; notably, [FeFe] hydrogenases were found only in the Guaymas Basin samples in this study. The phylogenetic placement of each hydrogenase type is consistent with the phylogeny of these five phyla (Fig. 3 and Supplementary Fig. 9).
F420-non-reducing hydrogenase (MvhADG), a type 3c [NiFe] hydrogenase, is distributed in all five phyla and is particularly common in Arandabacterota (more than half of the MAGs) and Joyebacterota (close to half of the MAGs). This hydrogenase provides reducing equivalents to heterodisulfide reductase without reacting with F420, i.e., it transfers electrons using H2 as the electron donor 33 .

Orphanbacterota is potentially involved in mercury detoxification 47 .
Selenium occurs in both organic and inorganic forms, with different oxidation states, in marine sediments 48 , and selenocysteine is the 21st amino acid 49 . … Meanwhile, selenium could also be released as selenide from selenocysteine during biosynthesis.
Arsenate and arsenite are the two dominant forms of inorganic arsenic in marine environments 55 . They induce toxicity by blocking general cell metabolism 56 . All five phyla have genes for an arsenic detoxification system. MAGs in all five phyla (though fewer than half of the MAGs in Arandabacterota and Joyebacterota) could reduce arsenate to arsenite via thioredoxin-coupled arsenate reductase (ArsC) 57 . Although arsenite is more toxic than arsenate, it can be extruded from the cell by the arsenite transporter (ArsAB) or converted by arsenite methyltransferase (AS3MT) to methylarsonate, which is less toxic than the inorganic forms 58 . ArsC might also support energy production, using arsenate as the terminal electron acceptor with sulfide or lactate as the electron donor 59,60 , rather than serving merely for detoxification.
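As a simplified illustration of the thioredoxin-coupled two-electron reduction of arsenate (As(V)) to arsenite (As(III)), the net reaction can be written as below. This assumes fully protonated arsenic species for bookkeeping (actual speciation depends on pH) and ignores the enzyme's intermediate steps; Trx(SH)2 and Trx(S2) denote reduced and oxidized thioredoxin.

```latex
\mathrm{H_3AsO_4} + \mathrm{Trx(SH)_2}
  \;\rightarrow\;
  \mathrm{H_3AsO_3} + \mathrm{Trx(S_2)} + \mathrm{H_2O}
```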

Active interaction with environments
All five phyla have genes encoding diverse transport systems, including importers for substrates ranging from small ions to large proteins, for energy conservation. They also encode exporters for detoxification or antibiotic resistance, indicating active exchange with the surrounding environment. A putative ATP-binding cassette (ABC) multiple sugar transport system is annotated in all five phyla (Supplementary Fig. 5), especially in most MAGs in Orphanbacterota, Arandabacterota, and Joyebacterota. Interestingly, ABC transporters for specific oligosaccharides are rare; for example, the specific transporters for maltose, cellobiose, lactose, and xylobiose are missing in all phyla. The general nucleoside transport system (NupABC), which transports all common nucleosides across the membrane 61 , is annotated in most MAGs in Orphanbacterota, Arandabacterota, and Joyebacterota. In addition, the ribose transporter is found in a few MAGs in AABM5 and Blakebacterota. Notably, transporters for other monosaccharides, e.g., glucose, arabinose, galactose, xylose, and fucose, are missing in all MAGs within the five phyla. Furthermore, very few genes related to the phosphotransferase system are annotated. Thus, we suggest that these phyla take up carbohydrates from the environment through the putative ABC multiple sugar transport system. Peptides could also be imported through the peptide/nickel transport system as an additional source of organic carbon.
The family is described based on a phylogeny of 37 concatenated conserved marker genes. Type genus is Candidatus Arandabacterum.
The order is described based on a phylogeny of 37 concatenated conserved marker genes. Type genus is Candidatus Arandabacterum.
The class is described based on a phylogeny of 37 concatenated conserved marker genes. Type genus is Candidatus Arandabacterum.
The phylum is described based on a phylogeny of 37 concatenated conserved marker genes. Type genus is Candidatus Arandabacterum.
The family is described based on a phylogeny of 37 concatenated conserved marker genes. Type genus is Candidatus Joyebacterum.
The order is described based on a phylogeny of 37 concatenated conserved marker genes. Type genus is Candidatus Joyebacterum.
The class is described based on a phylogeny of 37 concatenated conserved marker genes. Type genus is Candidatus Joyebacterum.
The phylum is described based on a phylogeny of 37 concatenated conserved marker genes. Type genus is Candidatus Joyebacterum.